SQL-BASED ANALYTIC ALGORITHMS

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. Section 119(e) of the co-pending U.S. provisional patent application Serial No. 60/102,831, filed October 2, 1998, by Timothy E. Miller, Brian D. Tate, James D. Hildreth, Miriam H. Herman, Todd M. Brye, and James E. Pricer, entitled Teradata Scalable Discovery, which application is incorporated by reference herein.

This application is also related to the following applications:

Application Serial No. [illegible], filed on same date herewith, by Brian D. Tate, James E. Pricer, Tej Anand, and Randy G. Kerber, entitled SQL-Based Analytic Algorithm for Association, attorney's docket number 8219,

Application Serial No. --/---,---, filed on same date herewith, by James D. Hildreth, entitled SQL-Based Analytic Algorithm for Clustering, attorney's docket number 8220,

Application Serial No. --/---,---, filed on same date herewith, by Todd M. Brye, entitled SQL-Based Analytic Algorithm for Rule Induction, attorney's docket number 8221,

Application Serial No. [illegible], filed on same date herewith, by Brian D. Tate, entitled SQL-Based Automated Histogram Bin Data Derivation Assist, attorney's docket number 8222,

Application Serial No. [illegible], filed on same date herewith, by Brian D. Tate, entitled SQL-Based Automated, Adaptive, Histogram Bin Data Description Assist, attorney's docket number 8223,

Application Serial No. [illegible], filed on same date herewith, by Timothy E. Miller, Brian D. Tate, Miriam H. Herman, Todd M. Brye, and Anthony L. Rollins, entitled Data Mining Assists in a [illegible],

Application Serial No. [illegible], entitled [illegible] Data Reduction Techniques for Delivering Data to Analytic Tools, attorney's docket number 8225,

Application Serial No. PCT/US99/-----, filed on same date herewith, by Timothy E. Miller, Miriam H. Herman, and Anthony L. Rollins, entitled Techniques for Deploying Analytic Models in Parallel, attorney's docket number 8226, and

Application Serial No. PCT/US99/-----, filed on same date herewith, by Timothy E. Miller, Brian D. Tate, and [illegible], entitled Analytic Logical Data Model, attorney's docket number 8227,

all of which are incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention
This invention relates in general to a relational database management system, and in particular, to SQL-based analytic algorithms that provide statistical and machine learning methods to create analytic models from the data residing in a relational database.

2. Description of Related Art

Relational databases are the predominate form of database management systems used in computer systems. Relational database management systems are often used in so-called "data warehouse" applications where enormous amounts of data are stored and processed. In recent years, several trends have converged to create a new class of data warehousing applications known as data mining applications. Data mining is the process of identifying and interpreting patterns in databases, and can be generalized into three stages.

Stage one is the reporting stage, which analyzes the data to determine what happened. Generally, most data warehouse implementations start with a focused application in a specific functional area of the business. These applications usually focus on reporting historical snapshots of business information that was previously difficult or impossible to access. Examples include Sales Revenue Reporting, Production Reporting and Inventory Reporting, to name a few.

Stage two is the analyzing stage, which analyzes the data to determine why it happened. As stage one end-users gain previously unseen views of their business, they quickly seek to understand why certain events occurred, for example a decline in sales revenue. After discovering a reported decline in sales, data warehouse users will then obviously ask, "Why did sales go down?" Learning the answer to this question typically involves probing the database through an iterative series of ad hoc or multidimensional queries until the root cause of the condition is discovered. Examples include Sales Analysis, Inventory Analysis or Production Analysis.

Stage three is the predicting stage, which tries to determine what will happen. As stage two users become more sophisticated, they begin to extend their analysis to include prediction of unknown events. For example, "Which end-users are likely to buy a particular product?", or "Who is at risk of leaving for the competition?" It is difficult for humans to see or interpret subtle relationships in data, hence as data warehouse users evolve to sophisticated predictive analysis they soon reach the limits of traditional query and reporting tools. Data mining helps end-users break through these limitations by leveraging intelligent software tools to shift some of the analysis burden from the human to the machine, enabling the discovery of relationships [illegible].

Many data mining technologies are available, from single algorithm solutions to complete tool suites. Most of these technologies, however, are used in a desktop environment where little data is captured and maintained. Therefore, most data mining tools are used to analyze small data samples, which were gathered from various sources into proprietary data structures or flat files. On the other hand, organizations are beginning [illegible] and end-users are asking more complex [illegible].

Unfortunately, data mining technologies cannot be used with large volumes of data. Further, most analytical techniques used in data mining are algorithmic-based rather than data-driven, and as such, there is currently little synergy between data mining and data warehouses. Moreover, from a usability perspective, traditional data mining techniques are too complex for use by database administrators and application programmers, and are too difficult to change for a different industry or a different customer.

Thus, there is a need in the art for data mining applications that directly operate against data warehouses, and that allow non-statisticians to benefit from advanced mathematical techniques available in a relational environment.



SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for performing data mining applications in a relational database management system. At least one analytic algorithm is performed by a computer directly against a relational database, wherein the analytic algorithm includes SQL statements performed by the relational database management system and optional programmatic iteration, and the analytic algorithm creates at least one analytic model within an analytic logical data model from data residing in the relational database.

An object of the present invention is to provide more efficient usage of parallel processor computer systems. An object of the present invention is to provide a foundation for data mining tool sets in relational database management systems. Further, an object of the present invention is to allow data mining of large databases.



BRIEF DESCRIPTION OF THE DRAWINGS
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:

FIG. 1 is a block diagram that illustrates an exemplary computer hardware environment that could be used with the preferred embodiment of the present invention;

FIG. 2 is a block diagram that illustrates an exemplary logical architecture that could be used with the preferred embodiment of the present invention; and

FIGS. 3, 4, and 5 are flowcharts that illustrate exemplary logic performed according to the preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In the following description of the preferred embodiment, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

OVERVIEW 

The present invention provides a relational database management system (RDBMS) that supports data mining operations of relational databases. In essence, advanced analytic processing capabilities for data mining applications are placed where they belong, i.e., close to the data. Moreover, the results of these analytic processing capabilities can be made to persist within the database or can be exported from the database. These analytic processing capabilities and their results are exposed externally to the RDBMS by an application programming interface (API).

According to the preferred embodiment, the data mining process is an iterative approach referred to as a "Knowledge Discovery Analytic Process" (KDAP). There are six major tasks within the KDAP:

1. [illegible]

2. Understanding the source data available.

3. Selecting the data set.

4. Designing the analytic model.

5. Creating and testing the models.

6. Deploying the analytic models.

The present invention provides various components for addressing these tasks:

• An RDBMS that executes Structured Query Language (SQL) statements against a relational database.

• An analytic Application Programming Interface (API) that creates scalable data mining functions comprised of complex SQL statements.

• Application programs that instantiate and parameterize the analytic API.

• Analytic algorithms utilizing:

    • Extended ANSI SQL statements,

    • a Call Level Interface (CLI) comprised of SQL statements and programmatic iteration, and

    • a Data Reduction Utility Program comprised of SQL statements and programmatic iteration.

• An analytical logical data model (LDM) that stores results from and information about the advanced analytic processing in the RDBMS.

• A parallel deployer that controls parallel execution of the results of the analytic algorithms that are stored in the analytic logical data model.

The benefits of the present invention include:

• Data mining of very large databases directly within a relational database.

• Management of analytic results within a relational database.

• A comprehensive set of analytic operations that operate within a relational database management system.

• Application integration [illegible].

These components and benefits are described in more detail below.

HARDWARE ENVIRONMENT

FIG. 1 is a block diagram that illustrates an exemplary computer hardware environment that could be used with the preferred embodiment of the present invention. In the exemplary computer hardware environment, a massively parallel processing (MPP) computer system 100 is comprised of one or more processors or nodes 102 interconnected by a network 104. Each of the nodes 102 is comprised of one or more processors, random access memory (RAM), read-only memory (ROM), and other components. It is envisioned that attached to the nodes 102 may be one or more fixed and/or removable data storage units (DSUs) 106 and one or more data communications units (DCUs) 108, as is well known in the art.
Each of the nodes 102 executes one or more computer programs, such as a Data Mining Application (APPL) 110 performing data mining operations, Advanced Analytic Processing Components (AAPC) 112 for providing advanced analytic processing capabilities for the data mining operations, and/or a Relational Database Management System (RDBMS) 114 for managing a relational database 116 stored on one or more of the DSUs 106 for use in the data mining applications, wherein various operations are performed in the APPL 110, AAPC 112, and/or RDBMS 114 in response to commands from one or more Clients 118. In alternative embodiments, the APPL 110 may be executed in one or more of the Clients 118, or on an application server on a different platform attached to the network 104.

Generally, the computer programs are tangibly embodied in and/or retrieved from RAM, ROM, one or more of the DSUs 106, and/or a remote device coupled to the computer system 100 via one or more of the DCUs 108. The computer programs comprise instructions which, when read and executed by a node 102, cause the node 102 to perform the steps necessary to execute the steps or elements of the present invention.

Those skilled in the art will recognize that the exemplary environment illustrated in FIG. 1 is not intended to limit the present invention. Indeed, those skilled in the art will recognize that other alternative hardware environments may be used without departing from the scope of the present invention. In addition, it should be understood that the present invention may also apply to other computer programs than those disclosed herein.

FIG. 2 is a block diagram that illustrates an exemplary logical architecture of the AAPC 112, and its interaction with the APPL 110, RDBMS 114, relational database 116, and Client 118, according to the preferred embodiment of the present invention. In the preferred embodiment, the AAPC 112 includes the following components:

• An Analytic Logical Data Model (LDM) 200 that stores results from the advanced analytic processing in the RDBMS 114,

• One or more Scalable Data Mining Functions 202 that comprise complex, optimized SQL statements that perform advanced analytic processing in the RDBMS 114,

• An Analytic Application Programming Interface (API) 204 that provides a mechanism for an APPL 110 or other component to invoke the Scalable Data Mining Functions 202,

• One or more Analytic Algorithms 206 that can operate as standalone applications or can be invoked by another component, wherein the Analytic Algorithms 206 comprise:

    • Extended ANSI SQL 208 that can be used to implement a certain class of Analytic Algorithms 206,

    • A Call Level Interface (CLI) 210 that can be used when a combination of SQL and programmatic iteration is required to implement a certain class of Analytic Algorithms 206, and

    • A Data Reduction Utility Program 212 that can be used to implement a certain class of Analytic Algorithms 206 where data is first reduced using SQL followed by programmatic iteration.

• An Analytic Algorithm Application Programming Interface (API) 214 that provides a mechanism for an APPL 110 or other components to invoke the Analytic Algorithms 206,

• A Parallel Deployer 216 that controls parallel executions of the results of an Analytic Algorithm 206 (sometimes referred to as an analytic model) that are stored in the Analytic LDM 200, wherein the results of executing the Parallel Deployer 216 are stored in the RDBMS 114.

Note that the use of these various components is optional, and thus only some of the components may be used in any particular configuration.

The preferred embodiment is oriented towards a multi-tier logical architecture, in which a Client 118 interacts with the various components described above, which, in turn, interface to the RDBMS 114 to utilize a large central repository of enterprise data stored in the relational database 116 for analytic processing.

In one example, a Client 118 interacts with an APPL 110, which interfaces to the Analytic API 204 to invoke one or more of the Scalable Data Mining Functions 202, which are executed by the RDBMS 114. The results from the execution of the Scalable Data Mining Functions 202 would be stored as an analytic model within an Analytic LDM 200 in the RDBMS 114.

In another example, a Client 118 interacts with one or more Analytic Algorithms 206 either directly or via the Analytic Algorithm API 214. The Analytic Algorithms 206 comprise SQL statements that may or may not include programmatic iteration, and the SQL statements are executed by the RDBMS 114. In addition, the Analytic Algorithms 206 may or may not interface to the Analytic API 204 to invoke one or more of the Scalable Data Mining Functions 202, which are executed by the RDBMS 114. Regardless, the results from the execution of the Analytic Algorithms 206 would be stored as an analytic model within an Analytic LDM 200 in the RDBMS 114.

In yet another example, a Client 118 interacts with the Parallel Deployer 216, which invokes parallel instances of the results of the Analytic Algorithms 206, sometimes referred to as an Analytic Model. The Analytic Model is stored in the Analytic LDM 200 as a result of executing an instance of the Analytic Algorithms 206. The results of executing the Parallel Deployer 216 are stored in the RDBMS 114.

In still another example, a Client 118 interacts with the APPL 110, which invokes one or more Analytic Algorithms 206 either directly or via the Analytic Algorithm API 214. The results would be stored as an analytic model within an Analytic LDM 200 in the RDBMS 114.

The overall goal is to significantly improve the performance, efficiency, and scalability of data mining operations by performing compute and/or I/O intensive operations in the various components. The preferred embodiment achieves this not only through the parallelism provided by the MPP computer system 100, but also from reducing the amount of data that flows between the APPL 110, AAPC 112, RDBMS 114, Client 118, and other components.

Those skilled in the art will recognize that the exemplary configurations illustrated and discussed in conjunction with FIG. 2 are not intended to limit the present invention. Indeed, those skilled in the art will recognize that other alternative configurations may be used without departing from the scope of the present invention. In addition, it should be understood that the present invention may also apply to other components than those disclosed herein.

Scalable Data Mining Functions

The Scalable Data Mining Functions 202 comprise complex, optimized SQL statements that are created, in the preferred embodiment, by parameterizing and instantiating the corresponding Analytic APIs 204. The Scalable Data Mining Functions 202 perform much of the advanced analytic processing for data mining applications, when performed by the RDBMS 114, without having to move data from the relational database 116.
The Scalable Data Mining Functions 202 can be categorized by the following functions:

• Data Description: The ability to understand and describe the available data using statistical techniques. For example, the generation of descriptive statistics, frequencies and/or histogram bins.

• Data Derivation: The ability to generate new variables (transformations) based upon existing detailed data when designing an analytic model. For example, the generation of variables such as bitmaps, ranges, codes and mathematical functions.

• Data Reduction: The ability to reduce the number of variables (columns) or observations (rows) used when designing an analytic model. For example, creating Covariance, Correlation, or Sum of Squares and Cross-Products (SSCP) Matrices.

• Data Reorganization: The ability to join or denormalize pre-processed results into [illegible].

• Data Sampling/Partitioning: The ability to intelligently request different data samples or data partitions. For example, hash data partitioning or data sampling.
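
The Data Description category can be illustrated with a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the RDBMS 114, and the table name sales, its columns, and the bin width are hypothetical. The point is only that descriptive statistics and histogram bins are produced by SQL inside the database, so no detail rows are moved to the application.

```python
import sqlite3

# SQLite stands in for the RDBMS; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 5.0), (1, 15.0), (2, 25.0), (2, 35.0), (3, 45.0)],
)

# Descriptive statistics computed by the database in a single pass.
count, low, high, mean = conn.execute(
    "SELECT COUNT(*), MIN(amount), MAX(amount), AVG(amount) FROM sales"
).fetchone()

# Equal-width histogram bins, also computed entirely in SQL.
BIN_WIDTH = 20  # assumed bin width, chosen for illustration
histogram = conn.execute(
    "SELECT CAST(amount / ? AS INTEGER) AS bin, COUNT(*) "
    "FROM sales GROUP BY bin ORDER BY bin",
    (BIN_WIDTH,),
).fetchall()
```

Only the small summary rows cross from the database to the caller, which is the property the text attributes to the Scalable Data Mining Functions 202.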

The principal theme of the Scalable Data Mining Functions 202 is to facilitate analytic operations within the RDBMS 114, which process data collections stored in the database 116 and produce results that also are stored in the database 116. Since data mining operations tend to be iterative and exploratory, the database 116 in the preferred embodiment comprises a combined storage and work space environment. As such, a sequence of data mining operations is viewed as a set of steps that start with some collection of tables in the database 116, generate a series of intermediate work tables, and [illegible].

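The idea of the database as a combined storage and work space can be sketched as follows. This is an illustration under assumptions rather than the disclosed implementation: SQLite stands in for the RDBMS 114, and the table names (txn, work_cust_totals, result_cust_segment) and the threshold of 50 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Starting collection of tables in the database.
CREATE TABLE txn (cust_id INTEGER, amount REAL);
INSERT INTO txn VALUES (1, 10.0), (1, 30.0), (2, 5.0), (3, 100.0), (3, 20.0);

-- Step 1: an intermediate work table derived from the base table.
CREATE TABLE work_cust_totals AS
    SELECT cust_id, SUM(amount) AS total, COUNT(*) AS n_txn
    FROM txn GROUP BY cust_id;

-- Step 2: a result table built from the work table (a derived code variable).
CREATE TABLE result_cust_segment AS
    SELECT cust_id,
           CASE WHEN total >= 50 THEN 'high' ELSE 'low' END AS segment
    FROM work_cust_totals;
""")

segments = conn.execute(
    "SELECT cust_id, segment FROM result_cust_segment ORDER BY cust_id"
).fetchall()
```

Every step reads and writes tables in the same database, so intermediate results never leave it.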
Analytic Algorithms

The Analytic Algorithms 206 provide statistical and "machine learning" methods to create Analytic LDMs 200 from the data residing in the relational database 116. Analytic Algorithms 206 that are completely data driven, such as association, can be implemented solely in Extended ANSI SQL 208. Analytic Algorithms 206 that require a combination of SQL and programmatic iteration, such as induction, can be implemented using the CLI 210. Finally, Analytic Algorithms 206 that require almost complete programmatic iteration, such as clustering, can be implemented using a Data Reduction Utility Program 212, wherein this approach involves data pre-processing that reduces the amount of data that a non-SQL algorithm can then process.

The Analytic Algorithms 206 significantly improve the performance and efficiency of data mining operations by providing the technology components to perform advanced analytic operations directly against the RDBMS 114. In addition, the Analytic Algorithms 206 leverage the parallelism that exists in the MPP computer system 100, the RDBMS 114, and the database 116.

The Analytic Algorithms 206 provide data analysts with an unprecedented option to train and apply "machine learning" analytics against massive amounts of data in the relational database 116. Prior techniques have failed as their sequential design is not optimal in an RDBMS 114 environment. Because the Analytic Algorithms 206 are implemented in Extended ANSI SQL 208, through the CLI 210, and/or by means of the Data Reduction Utility Program 212, they can therefore leverage the scalability available on the MPP computer system 100. In addition, taking a data-driven approach to analysis, through the use of complete Extended ANSI SQL 208, allows people other than highly educated statisticians to leverage the advanced analytic techniques offered by the Analytic Algorithms 206.

Extended ANSI SQL

As mentioned above, Analytic Algorithms 206 that are completely data driven, such as affinity analysis, can be implemented solely in Extended ANSI SQL 208. Typically, these types of algorithms operate against a set of tables in the relational database 116 that are populated with transaction-level data, the source of which could be point-of-sale devices, automated teller machines, call centers, the Internet, etc. The SQL statements used to process this data typically build relationships between and among data elements in the tables. For example, the SQL statements used to process data from point-of-sale devices build relationships between and among products and pairs of products. Additionally, the dimension of time can be added in such a way that these relationships can be analyzed to determine how they change over time. As the implementation is solely in SQL statements, the design takes advantage of the hardware and software environment of the preferred embodiment by decomposing the SQL statements into a plurality of sort and merge steps that can be executed concurrently in parallel by the MPP computer system 100.
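
As an illustration of building relationships between pairs of products purely in SQL, the following sketch counts how often two products appear in the same transaction. It is a simplified example under assumptions, not the disclosed association algorithm: SQLite stands in for the RDBMS 114, and the table basket, its columns, and the support measure (pair count divided by the number of transactions) are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basket (txn_id INTEGER, product TEXT)")
conn.executemany("INSERT INTO basket VALUES (?, ?)", [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "bread"), (3, "eggs"),
    (4, "milk"),
])

# A self-join pairs up products bought in the same transaction; the
# a.product < b.product condition keeps each unordered pair exactly once.
PAIR_SQL = """
SELECT a.product AS lhs, b.product AS rhs,
       COUNT(*) AS pair_count,
       COUNT(*) * 1.0 / (SELECT COUNT(DISTINCT txn_id) FROM basket) AS support
FROM basket a
JOIN basket b ON a.txn_id = b.txn_id AND a.product < b.product
GROUP BY a.product, b.product
ORDER BY pair_count DESC, lhs, rhs
"""
pairs = conn.execute(PAIR_SQL).fetchall()
```

The whole computation is a join followed by a grouping, which is the kind of statement a parallel database can decompose into concurrent sort and merge steps.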

Call-Level Interface

As mentioned above, Analytic Algorithms 206 that require a mix of programmatic iteration along with Extended ANSI SQL statements, such as inductive inference, can be implemented using the CLI 210. Whereas the SQL approach is appropriate for business problems that are descriptive in nature, inference problems are predictive in nature and typically require a training phase where the APPL 110 "learns" various rules based upon the data description, followed by testing and application, and where the rules are validated and applied against a new data set. This class of algorithms is compute-intensive and historically cannot handle large volumes of data because the algorithms expect the analyzed data to be in a specific fixed or variable flat file format.

Most implementations first extract the data from the database 116 to construct a flat file and then execute the "train" portion on this resultant file. This method is slow and limited by the amount of memory available in the computer system 100. This process can be improved by leveraging the relational database 116 to perform those portions of the analysis, instead of extracting all the data.

When SQL statements and programmatic iteration are used together, the RDBMS 114 can be leveraged to perform computations and order data within the relational database 116, and then extract the information using very little memory in the APPL 110. Additionally, computations, aggregations and ordering can be run in parallel, because of the massively parallel nature of the RDBMS 114.
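
The division of labor between SQL and programmatic iteration can be sketched as follows. The text does not specify the induction algorithm, so this example substitutes a simple one-level rule learner: for each candidate attribute, each attribute value predicts its majority class, and the attribute with the fewest misclassifications wins. SQLite stands in for the RDBMS 114, and the table train, its columns, and the function name errors_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany("INSERT INTO train VALUES (?, ?, ?)", [
    ("sunny", "no", "yes"), ("sunny", "yes", "yes"),
    ("rainy", "no", "no"), ("rainy", "yes", "no"),
    ("sunny", "no", "yes"), ("rainy", "no", "yes"),
])

def errors_for(attribute):
    """Count misclassifications if each attribute value predicts its majority class."""
    # The database aggregates; only one small row per (value, class) comes back.
    # The attribute name comes from a fixed list below, never from user input.
    rows = conn.execute(
        f"SELECT {attribute}, play, COUNT(*) FROM train GROUP BY {attribute}, play"
    ).fetchall()
    by_value = {}
    for value, _cls, n in rows:
        by_value.setdefault(value, []).append(n)
    return sum(sum(counts) - max(counts) for counts in by_value.values())

# Programmatic iteration over candidate attributes, one SQL aggregate per step.
scores = {attr: errors_for(attr) for attr in ("outlook", "windy")}
best_attribute = min(scores, key=scores.get)
```

The application holds only a handful of counts at any time, however many rows the training table contains.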

Data Reduction Utility Program

As mentioned above, for Analytic Algorithms 206 that can operate on a reduced or scaled data set, such as regression or clustering, the Data Reduction Utility Program 212 can be used. The problem of creating analytic models from massive amounts of detailed data has often been addressed by sampling, mainly because compute intensive algorithms cannot handle large volumes of data. The approach of the Data Reduction Utility Program 212 is to reduce data through operations such as matrix calculations or histogram binning, and then use this reduced or scaled data as input to a non-SQL algorithm. This method intentionally reduces fine numerical data details by assigning them to ranges, or bins, correlating their values or determining their covariances. The capacity of the preferred embodiment for creating these data structures from massive amounts of data in parallel gives it a special opportunity in this area.
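
A minimal sketch of this reduce-then-compute pattern follows, using simple linear regression as the non-SQL algorithm. It assumes SQLite as a stand-in for the RDBMS 114, and the table obs and its columns are hypothetical. The SQL step collapses all rows into a few sums and cross-products, and the fit is then computed from those sums alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x REAL, y REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)])

# Reduction step: the database collapses every row into five numbers.
n, sx, sy, sxx, sxy = conn.execute(
    "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
).fetchone()

# Non-SQL step: ordinary least squares computed from the reduced data only.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
```

The size of the reduced data depends on the number of variables, not on the number of rows, which is why the reduction can be handed to a conventional algorithm regardless of table size.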

Analytic Logical Data Model

The Analytic LDM 200, which is integrated with the relational database 116 and the RDBMS 114, provides logical entity and attribute definitions for advanced analytic processing, i.e., the Scalable Data Mining Functions 202 and Analytic Algorithms 206, performed by the RDBMS 114 directly against the relational database 116. These logical entity and attribute definitions comprise metadata that define the characteristics of data stored in the relational database 116, as well as metadata that determines how the RDBMS 114 performs the advanced analytic processing. The Analytic LDM 200 also stores processing results from this advanced analytic processing, which includes both result tables and derived data for the Scalable Data Mining Functions 202, Analytic Algorithms 206, and the Parallel Deployer 216. The Analytic LDM 200 is a dynamic model, since the logical entities and attributes definitions change depending upon parameterization of the advanced analytic processing, and since the Analytic LDM 200 is updated with the results of the advanced analytic processing.
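
One way to picture metadata and results living side by side in the database is the sketch below. The layout is purely hypothetical: the text does not give the schema of the Analytic LDM 200, so the tables ldm_column_meta and ldm_model, their columns, and the helper store_model are assumptions, with SQLite standing in for the RDBMS 114.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Metadata describing data in the database (hypothetical layout).
CREATE TABLE ldm_column_meta (table_name TEXT, column_name TEXT, role TEXT);
-- Results of analytic processing, stored next to the data they came from.
CREATE TABLE ldm_model (model_id INTEGER PRIMARY KEY, algorithm TEXT,
                        parameters TEXT, result TEXT);
""")
conn.execute("INSERT INTO ldm_column_meta VALUES ('obs', 'x', 'predictor')")
conn.execute("INSERT INTO ldm_column_meta VALUES ('obs', 'y', 'target')")

def store_model(algorithm, parameters, result):
    """Persist one analytic model and return its identifier."""
    cur = conn.execute(
        "INSERT INTO ldm_model (algorithm, parameters, result) VALUES (?, ?, ?)",
        (algorithm, json.dumps(parameters), json.dumps(result)),
    )
    return cur.lastrowid

model_id = store_model("regression", {"table": "obs"},
                       {"slope": 2.0, "intercept": 1.0})
stored = json.loads(conn.execute(
    "SELECT result FROM ldm_model WHERE model_id = ?", (model_id,)
).fetchone()[0])
```

Because the stored rows depend on the parameters of each run, the contents change from run to run, which matches the description of the model as dynamic.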

Logic of the Preferred Embodiment

Flowcharts which illustrate the logic of the preferred embodiment of the present invention are provided in FIGS. 3, 4, and 5. Those skilled in the art will recognize that this logic is provided for illustrative purposes only and that different logic may be used to accomplish the same results.

Referring to FIG. 3, this flowchart illustrates the logic of the Scalable Data Mining Functions 202 according to the preferred embodiment of the present invention.

Block 300 represents the one or more of the Scalable Data Mining Functions 202 being created via the API 204. This may entail, for example, the instantiation of an object providing the desired function.

Block 302 represents certain parameters being passed to the API 204, in order to control the operation of the Scalable Data Mining Functions 202.

Block 304 represents the metadata in the Analytic LDM 200 being accessed, if necessary for the operation of the Scalable Data Mining Function 202.

Block 306 represents the API 204 generating a Scalable Data Mining Function 202 in the form of a data mining query based on the passed parameters and optional metadata.

Block 308 represents the Scalable Data Mining Function 202 being passed to the RDBMS 114 for execution.
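
The flow of FIG. 3 can be sketched in a few lines. The class FrequencyFunction, its method to_sql, and the known_columns argument are hypothetical names invented for this illustration, and SQLite stands in for the RDBMS 114. The sketch only shows the sequence: instantiate an object, pass parameters, consult metadata, generate a query, and hand the query to the database.

```python
import sqlite3

class FrequencyFunction:
    """Hypothetical analytic API object that generates a data mining query."""

    def __init__(self, table, column):
        # Blocks 300/302: instantiate the function and pass its parameters.
        self.table, self.column = table, column

    def to_sql(self, known_columns):
        # Block 304: consult metadata (here, just a set of valid names),
        # which also keeps unknown identifiers out of the generated SQL.
        if (self.table, self.column) not in known_columns:
            raise ValueError("unknown column")
        # Block 306: generate the query from the passed parameters.
        return (
            f"SELECT {self.column}, COUNT(*) AS freq FROM {self.table} "
            f"GROUP BY {self.column} ORDER BY {self.column}"
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)", [("east",), ("west",), ("east",)])

function = FrequencyFunction("sales", "region")
sql = function.to_sql(known_columns={("sales", "region")})
frequencies = conn.execute(sql).fetchall()  # Block 308: the database executes it.
```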

^' Referring to FIG. 4, "thik flowchart ilitetr4tes the logic of the Analytic 
Algorithi^ 206 acc6Mn£ to the preffer^ed embodiment of the present invention. 
' * ■ Block 400 represents the An^iytrc Algorithms 206 being invoked, either 
' directly or via the Analytic Algblrfthii API 214. - 
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.r' bii; ; Block 402 represents certain parameters being p^ed to tie Analytic 
Algorithms 206, in order to control their operation. -f • . , ^ 

. . iBlock 404 represents the. met^ta in the Analytic IX^M 2CX) bein accessed, 
if necessaryrforithe operation of tlie, Analytic Algorithms 206., , ^ 

Blocfc 406 represents the Ajfial)rtic Algorithms 206 passing S^L statements 
to the RDBMS. U^Jor execution and^Block 408 optionally represent iJ^ie Analytic 
Algorithms 206:.perffirming prpgramram<: iteration. Those skilled in the art will 
recognize that the s^nuence of thes^ step^ m^y^di^^r from those described above, 
■ ^may not include bb^ stepSi may include add4t|Qft|J^^5?e^^ and may include 
iterations of these steps. 
Block 410 represents the Analytic Algorithms 206 storing results in the
Analytic LDM 200.

Referring to FIG. 5, this flowchart illustrates the logic performed by the
RDBMS 114 according to the preferred embodiment of the present invention.

Block 500 represents the RDBMS 114 receiving a query or other SQL
statements.

Block 502 represents the RDBMS 114 analyzing the query. 
Block 504 represents the RDBMS 114 generating a plan that enables the
RDBMS 114 to retrieve the correct information from the relational database 116 to
satisfy the query.

Block 506 represents the RDBMS 114 compiling the plan into object code 
for more efficient execution by the RDBMS 114, although it could be interpreted 
rather than compiled. 

Block 508 represents the RDBMS 114 initiating execution of the plan.

Block 510 represents the RDBMS 114 generating results from the execution 
of the plan. 

Block 512 represents the RDBMS 114 either storing the results in the 
Analytic LDM 200, or returning the results to the Analytic Algorithm 206, APPL 
110, and/or Client 118. 
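
The plan-then-execute sequence of Blocks 500 through 512 can be observed from the outside in most SQL engines. The following Python sketch is illustrative only and assumes SQLite in place of the RDBMS 114; the function name plan_and_execute is hypothetical. SQLite reports its chosen plan through EXPLAIN QUERY PLAN, and the wording of that report varies between versions, so the sketch returns it as opaque text.

```python
import sqlite3


def plan_and_execute(conn, sql, params=()):
    """Return (plan_steps, results) for a single SELECT statement."""
    # Ask the engine how it intends to satisfy the query (compare Block 504).
    # The human-readable detail is the last column of each plan row.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    # Execute the statement and collect the results (compare Blocks 508-512).
    results = conn.execute(sql, params).fetchall()
    return plan, results
```

A call such as plan_and_execute(conn, "SELECT v FROM t WHERE k = ?", (2,)) returns both the plan description and the matching rows, which makes it easy to check whether a generated query is using an index or scanning a table.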

CONCLUSION 
This concludes the description of the preferred embodiment of the 
invention. The following describes an alternative embodiment for accomplishing 
the same invention. Specifically, in an alternative embodiment, any type of
computer, such as a mainframe, minicomputer, or personal computer, could be
used to implement the present invention.

In summary, the present invention discloses a method, apparatus, and article
of manufacture for performing data mining applications in a relational database
management system. At least one analytic algorithm is performed by a computer
directly against a relational database, wherein the analytic algorithm includes SQL
statements performed by the relational database management system and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

The foregoing description of the preferred embodiment of the invention has
been presented for the purposes of illustration and description. It is not intended to
be exhaustive or to limit the invention to the precise form disclosed. Many
modifications and variations are possible in light of the above teaching. It is
intended that the scope of the invention be limited not by this detailed description,
but rather by the claims appended hereto.

WHAT IS CLAIMED IS:

1. A computer-implemented system for performing data mining
applications, comprising:

(a) a computer having one or more data storage devices connected thereto;

(b) a relational database management system, executed by the computer, for
managing a relational database stored on the data storage devices; and

(c) at least one analytic algorithm performed by the computer, wherein the
analytic algorithm includes SQL statements performed by the relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

2. The computer-implemented system of claim 1, wherein the analytic
algorithm provides statistical and machine learning methods for creating the
analytic logical data model.

3. The computer-implemented system of claim 1, wherein the analytic 
algorithm is implemented in Extended ANSI SQL. 

4. The computer-implemented system of claim 3, wherein the analytic
algorithm operates against a set of tables in the relational database, and the
Extended ANSI SQL builds relationships among data elements in the tables.

5. The computer-implemented system of claim 4, wherein the Extended
ANSI SQL analyzes the relationships to determine how the relationships change.

6. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented in a Call Level Interface (CLI) that processes data from
the relational database using SQL and programmatic iteration.

7. The computer-implemented system of claim 6, wherein the CLI is 
used with SQL to perform computations, aggregations, and/or ordering on the data 
from the relational database. 

8. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented by a Data Reduction Utility Program that reduces data
from the relational database in bulk using SQL followed by a non-SQL iterative
program.



9. The computer-implemented system of claim 8, wherein the Data
Reduction Utility Program provides a sequence of Extended ANSI SQL followed
by programmatic iteration.

10. A method for performing data mining applications, comprising:

(a) managing a relational database stored on one or more data storage devices
connected to a computer; and

(b) performing at least one analytic algorithm in the computer, wherein the
analytic algorithm includes SQL statements performed by a relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.



11. An article of manufacture comprising logic embodying a method for
performing data mining applications, comprising:

(a) managing a relational database stored on one or more data storage devices
connected to a computer; and

(b) performing at least one analytic algorithm in the computer, wherein the
analytic algorithm includes SQL statements performed by a relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
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